Mission: AI and Data Economy

…use data, AI and innovation to transform the prevention, early diagnosis and treatment of chronic diseases by 2030.

Stakeholder: the public (as individual stakeholders), local government

Deprivation

In the UK a household is classed as deprived by the Office for National Statistics (2011) if it meets one or more of the following criteria:

  • Employment: Where any member of a household, who is not a full-time student, is either unemployed or long-term sick.

  • Education: No person in the household has at least Level 2 education (see highest level of qualification), and no person aged 16 to 18 is a full-time student.

  • Health and disability: Any person in the household has general health that is ‘bad’ or ‘very bad’ or has a long-term health problem.

  • Housing: The household’s accommodation is either overcrowded, with an occupancy rating -1 or less, or is in a shared dwelling, or has no central heating.
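The "one or more criteria" rule above can be sketched in a few lines of R. This is a minimal illustration only: the `households` table and its column names are hypothetical and do not come from the ONS data.

```r
# Hypothetical household records: one logical flag per deprivation dimension.
# TRUE means the household meets that criterion.
households <- data.frame(
  household_id = 1:4,
  employment   = c(FALSE, TRUE,  TRUE,  FALSE),
  education    = c(FALSE, FALSE, TRUE,  FALSE),
  health       = c(FALSE, FALSE, TRUE,  TRUE),
  housing      = c(FALSE, FALSE, FALSE, FALSE)
)

dimension_cols <- c("employment", "education", "health", "housing")

# Number of dimensions (0 to 4) in which each household is deprived.
households$n_dimensions <- rowSums(households[, dimension_cols])

# A household is classed as deprived if it meets one or more criteria.
households$deprived <- households$n_dimensions >= 1

# Share of deprived households in this (made-up) area.
prop_deprived <- mean(households$deprived)
print(households)
print(prop_deprived)
```

Aggregating the `deprived` flag over the households in an area gives the kind of area-level proportion used as a predictor later in the analysis.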

Studies show that children in the most deprived areas are nearly twice as likely to be obese as those in the least deprived areas. Obesity and its associated health issues are estimated to have cost the NHS £4.2 billion in 2007, a figure expected to rise to £9.7 billion by 2050 (Baker, 2019). This covers only the cost to the NHS and excludes the financial impact on other services, such as social care, and wider economic consequences (Baker, 2019).

Why did we choose 10-11 year olds as an indicator?

Children who are obese at that age are more likely to be obese as adults (WHO, 2019). By 2017, in the UK, approximately one third of children between the ages of 2 and 15 were overweight or obese (Health and Social Care Information Centre, 2015). Children are becoming obese at an increasingly young age and remaining obese for longer than previous generations (Johnson et al., 2015). Obese children are more likely to suffer from low self-esteem and to be bullied, and to develop consequent diseases such as heart disease and type 2 diabetes (Public Health England, 2018).

Reducing obesity levels is crucial: obesity doubles the risk of premature death and, in addition to the risk of physical diseases, obese individuals are likely to live with mental health conditions such as depression (Gatineau and Dent, 2011). Access to parks and green spaces has been shown to reduce mortality as well as the risk of various chronic diseases.

Findings suggest that proximity to a park, when access is not hindered by major roadways, may help promote physical activity in the population and has been linked to positive health outcomes. Especially in urban areas, open spaces are valued by theorists who relate them to the social, political, and physical health of residents and communities. Some argue that high-quality, pedestrian-friendly neighbourhood spaces can engender beneficial interpersonal connections. However, in deprived areas, the use of green spaces can be discouraged either by low accessibility or by anti-social behaviour, which puts parents off visiting parks with their children (Edwards et al., 2014; Wolch et al., 2014; Timperio et al., 2005).

To address these issues, we suggest a combined approach to evaluating the efficiency of local councils' interventions against deprivation factors. For instance, in an urban context, practitioners have tended to address high rates of chronic disease by creating green areas intended to promote outdoor activity and so reduce obesity, especially in children. However, in our preliminary investigation we did not find a sufficiently significant relationship between childhood obesity and either the presence of urban green areas (mainly parks) or the air pollution index, most likely because the data are aggregated by ward or region.

Research Question

Using child obesity data (ages 10-11) as a representation of chronic disease, can we model which deprivation factors are most prevalent in each area, in order to inform the government where investment would be most efficient and to inform the public on how to facilitate healthy lifestyle choices?

Context

library(plotly)
library(knitr)
library(tidyverse)
library(sf)
library(here)
library(fingertipsR)
library(tmap)
tmap_mode("view")

Datasets

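# Read the December 2018 ward boundaries, keep the ward code, name and area,
# and reproject to the British National Grid (EPSG:27700).
wards <- st_read(here("data", "raw", "Wards_December_2018_Generalised_Clipped_Boundaries_UK", "Wards_December_2018_Generalised_Clipped_Boundaries_UK.shp")) %>% 
  select(wd18cd, wd18nm, st_areasha) %>%
  st_transform(27700)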
## Reading layer `Wards_December_2018_Generalised_Clipped_Boundaries_UK' from data source `C:\Users\diego\Documents\000000000000_WORK\00000_SSASM\TeamB\data\raw\Wards_December_2018_Generalised_Clipped_Boundaries_UK\Wards_December_2018_Generalised_Clipped_Boundaries_UK.shp' using driver `ESRI Shapefile'
## Simple feature collection with 9114 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -116.1928 ymin: 5342.7 xmax: 655644.8 ymax: 1220302
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
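# Download the Year 6 obesity indicator (ID 93107) from Fingertips
# and keep only the ward-level rows.
obese_year6_pct <- fingertips_data(IndicatorID = 93107, AreaTypeID = 8) %>%
  filter(AreaType == "Ward")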


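# Attach ward geometries to the obesity values by ward code;
# right_join keeps every row of the obesity data.
obese_year6_geo <- wards %>%
  right_join(obese_year6_pct, by = c("wd18cd" = "AreaCode")) %>%
  select(wd18cd, wd18nm, Value)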
## Warning: Column `wd18cd`/`AreaCode` joining factor and character vector,
## coercing into character vector
tm_shape(obese_year6_geo) + tm_polygons(col = "Value")

Because these data are monitored continuously by the government and the NHS, they provide ready measures of progress for potential new policies that promote an active lifestyle, targeted at the deprivation factors that affect each region most.

Analysis

We ran a regression model to investigate the relationship between the dependent variable, obesity, and the independent variables: the number of people with no qualification, the number of people with no access to a car or van, the number of households which are socially rented, and the proportion of households which are classed as deprived.
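A regression of this shape can be sketched with base R's `lm()`. The snippet below is an illustration under stated assumptions, not the model fitted for this report: the data are simulated, and the column names (`no_qualification`, `no_car`, `social_rented`, `deprived_prop`, `pct_obese`) are hypothetical stand-ins for the ward-level variables described above.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated ward-level predictors (hypothetical names and ranges).
n <- 200
ward_data <- data.frame(
  no_qualification = runif(n, 5, 40),
  no_car           = runif(n, 5, 60),
  social_rented    = runif(n, 0, 50),
  deprived_prop    = runif(n, 0.3, 0.8)
)

# Simulated outcome: a linear combination of the predictors plus noise.
ward_data$pct_obese <- 5 +
  0.15 * ward_data$no_qualification +
  0.05 * ward_data$no_car +
  0.10 * ward_data$social_rented +
  4.00 * ward_data$deprived_prop +
  rnorm(n, sd = 2)

# Ordinary least squares with obesity as the dependent variable.
model <- lm(
  pct_obese ~ no_qualification + no_car + social_rented + deprived_prop,
  data = ward_data
)

# Coefficients, standard errors and R-squared.
print(summary(model))
```

With real data, `ward_data` would be replaced by the joined ward-level table, and the formula would use the actual column names.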

The results are promising: the model has an R-squared of 0.683, meaning it explains about 68% of the variance in obesity.

The R-squared tells us the proportion of variance in the dependent variable, obesity, that can be explained by the independent variables. It is an overall measure of the strength of association and does not reflect the extent to which any particular independent variable is associated with the dependent variable.
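To make the definition concrete, R-squared can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses R's built-in `mtcars` dataset purely as a stand-in, since the ward-level table is not reproduced here; the variable names are illustrative.

```r
# Fit a small linear model on a built-in dataset (stand-in for the ward data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residual sum of squares: variation left unexplained by the model.
ss_res <- sum(residuals(fit)^2)

# Total sum of squares: variation of the outcome around its mean.
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Proportion of variance explained.
r_squared_manual <- 1 - ss_res / ss_tot

# The manual value should agree with the one reported by summary().
print(r_squared_manual)
print(all.equal(r_squared_manual, summary(fit)$r.squared))
```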

The coefficient column tells us how the dependent variable, %obese, changes with each independent variable. We can see that:

Coefficient

  • socialrented: 0.5720

  • Deprivation_classification_of_household: 4.5348

The coefficient for the number of socially rented households is 0.57. A coefficient gives the expected change in the dependent variable, obesity, for a one-unit increase in the independent variable, holding the other variables constant. The coefficient for the proportion of households classed as deprived is 4.53. A regression coefficient is not capped at one (unlike a correlation coefficient); its size depends on the units of the independent variable, so a value of 4.53 is not in itself evidence of overfitting. Overfitting nonetheless remains a risk for this model. It occurs when a model fits the training data too closely, learning its detail and noise to the extent that performance on new data suffers, and it is a common problem in machine learning and data science. To check for and reduce overfitting, we would ideally have had more time to conduct cross-validation; we could also train on more data or use bagging, which averages the predictions of many models fitted to bootstrap resamples of the data to smooth out predictions.
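Cross-validation, mentioned above as a check for overfitting, can be sketched in base R as follows. This is a minimal k-fold example on the built-in `mtcars` dataset, used only as a stand-in for the ward-level data; the choice of five folds and the model formula are assumptions for illustration.

```r
set.seed(1)  # reproducible fold assignment

data <- mtcars
k <- 5

# Randomly assign each row to one of k folds of near-equal size.
fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))

# Collect squared prediction errors on held-out rows.
cv_sq_err <- numeric(0)
for (fold in seq_len(k)) {
  train <- data[fold_id != fold, ]
  test  <- data[fold_id == fold, ]

  # Fit on the training folds only, then predict the held-out fold.
  fit  <- lm(mpg ~ wt + hp, data = train)
  pred <- predict(fit, newdata = test)

  cv_sq_err <- c(cv_sq_err, (test$mpg - pred)^2)
}

# Out-of-sample error estimated by cross-validation.
cv_rmse <- sqrt(mean(cv_sq_err))

# In-sample error of a model fitted to all rows, for comparison.
full_fit <- lm(mpg ~ wt + hp, data = data)
in_sample_rmse <- sqrt(mean(residuals(full_fit)^2))

print(c(cv_rmse = cv_rmse, in_sample_rmse = in_sample_rmse))
```

A cross-validated error much larger than the in-sample error would suggest the model is overfitting; similar values suggest it generalises reasonably well.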

Recommendation

General Recommendation: Possible measures to be adopted, depending on the major contributing factor by region:

What are the caveats?

  • Financial restrictions (public financial constraints)

  • Reliance on public ‘good will’ (human constraints)

Data Recommendation:

Encourage the publication of data regarding the financial investment of local authorities into local initiatives to promote an awareness and improved knowledge of nutrition.

More congruence between datasets, for example the air pollution index data and the green space data.

There is a wealth of historical data for both input and output variables, which makes it straightforward to measure the progress of any implemented policies. Because all of the data used are publicly collected regardless of this investigation, the approach is simple to conduct.

Ethics:

There are ethical concerns linked to early profiling of children, which could promote eating disorders and bullying.